Video-based hand gesture recognition dataset using a thermal camera

The dataset includes thermal videos of various hand gestures captured with the FLIR Lepton thermal camera. A large dataset was created to enable accurate classification of hand gestures performed by eleven different individuals. The dataset consists of 9 classes corresponding to different hand gestures, collected from different people at different time instances against complex backgrounds. The classes are flat/leftward, flat/rightward, flat/contract, spread/leftward, spread/rightward, spread/contract, V-shape/leftward, V-shape/rightward, and V-shape/contract. There are 110 videos per gesture, for a total of 990 videos across the 9 gestures. Each video is provided at three frame lengths: 15, 10, and 5 frames.
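The class list and counts above can be cross-checked with a short Python sketch. It assumes, as the class names suggest, that the 9 classes are the combinations of three hand shapes (Flat, Spread, V-shape) and three motions (Leftward, Rightward, Contract), numbered in the order used for Fig. 2, and that the 110 videos per gesture come from 11 individuals performing each gesture 10 times, as described under Data file description. The names `SHAPES`, `MOTIONS`, and `CLASSES` are illustrative and not part of the dataset.

```python
from itertools import product

# Hand shapes and motions that make up the class names (illustrative names).
SHAPES = ["Flat", "Spread", "V-shape"]
MOTIONS = ["Leftward", "Rightward", "Contract"]

# Classes 1..9, shape-major, in the order used for Fig. 2.
CLASSES = {
    i: f"{shape}/{motion}"
    for i, (shape, motion) in enumerate(product(SHAPES, MOTIONS), start=1)
}

N_PEOPLE = 11       # individuals
N_REPETITIONS = 10  # repetitions of each gesture per person

videos_per_gesture = N_PEOPLE * N_REPETITIONS     # 110
total_videos = videos_per_gesture * len(CLASSES)  # 990

print(len(CLASSES), videos_per_gesture, total_videos)  # 9 110 990
```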


Specifications Table
This section lists the details of the hardware and the procedure used to collect the data, followed by the format of the data.

Value of the Data
• The dataset is useful for developing machine learning models that classify and recognize video-based hand gestures more efficiently.
• The dataset will help computer vision researchers develop machine learning algorithms for accurate classification and recognition of hand gestures.
• The data can be used to create and test new algorithms for video-based hand gesture recognition.
• To capture as many variations as possible, the data was collected for many different hand gestures at different time instances.

Background
The majority of datasets available in the literature are captured with RGB cameras. The NUS hand digit dataset [1] and the ASL Finger Spelling dataset [2] are two of the most commonly used RGB datasets. However, RGB cameras perform poorly under a variety of lighting conditions [3,4]. Motivated by this, a thermal imaging dataset for sign language digits was published in [5]. However, the dataset in [5] was captured with a low-resolution thermal camera of 32 × 32 pixels. To address this, a higher-resolution dataset was published in [6], collected with a thermal camera of 160 × 120 pixels. Both of these thermal datasets are image-based. Thus, in this work, we present a video-based hand gesture dataset of 9 classes.

Data Description
The dataset contains the video frames captured by our thermal camera. The frames were captured from eleven individuals for different hand forms and shapes, and the gestures are the same for all 11 individuals. The data was captured with the hands placed at different distances from the thermal camera, up to a maximum of 3 meters. The dataset is organized into two sections: classes and takes.

Data file description
The layout of the data repository is depicted in Fig. 1. The root folder contains a single folder, which is divided into 9 folders, one for each hand gesture. Each hand gesture folder contains video frames captured from 11 different people, each of whom performed the gesture 10 times at different distances and positions in the frame.
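To iterate over the repository programmatically, a loader needs only the folder nesting described above. The sketch below is hypothetical: `index_dataset` is not shipped with the dataset, and because the nesting inside each gesture folder (per person, per take) is shown only in Fig. 1, it simply collects every file beneath each gesture folder. The folder and file names in the demo are placeholders; adjust the function to the actual layout in Fig. 1.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def index_dataset(root: str | Path) -> dict[str, list[Path]]:
    """Map each gesture folder name to the sorted list of files beneath it.

    Hypothetical layout: root/<single top-level folder>/<gesture folder>/...
    Files at any depth below a gesture folder are collected.
    """
    top_dirs = [p for p in Path(root).iterdir() if p.is_dir()]
    if len(top_dirs) != 1:
        raise ValueError(f"expected one top-level folder, found {len(top_dirs)}")
    index: dict[str, list[Path]] = {}
    for gesture_dir in sorted(p for p in top_dirs[0].iterdir() if p.is_dir()):
        index[gesture_dir.name] = sorted(
            f for f in gesture_dir.rglob("*") if f.is_file()
        )
    return index


if __name__ == "__main__":
    # Build a tiny fake tree with placeholder names to show the expected shape.
    with tempfile.TemporaryDirectory() as tmp:
        for g in range(1, 10):
            d = Path(tmp, "data", f"gesture_{g}", "person_1")
            d.mkdir(parents=True)
            (d / "frame_000.png").touch()
        idx = index_dataset(tmp)
        print(len(idx), sum(len(v) for v in idx.values()))  # 9 9
```

Collecting files recursively keeps the sketch agnostic to how people and takes are nested; once the exact layout is known, the person and take can be parsed from each file's relative path.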
Fig. 2 depicts the complete set of thermal frames captured from the start to the end of each hand gesture.
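The article does not state how the 15-, 10-, and 5-frame versions of each video were derived. If a user needs a shorter fixed-length clip, or a few representative frames spanning a gesture from start to end (Fig. 2 shows four per class), one common option, assumed here rather than taken from the authors' procedure, is uniform temporal sampling. The hypothetical helper `evenly_spaced_indices` returns frame indices that include the first and last frame.

```python
from __future__ import annotations


def evenly_spaced_indices(num_frames: int, n: int) -> list[int]:
    """Return n frame indices spread evenly over a clip of num_frames frames.

    For n >= 2 the first (0) and last (num_frames - 1) frames are included.
    Integer arithmetic avoids floating-point rounding surprises.
    """
    if not 1 <= n <= num_frames:
        raise ValueError("n must be between 1 and num_frames")
    if n == 1:
        return [0]
    return [i * (num_frames - 1) // (n - 1) for i in range(n)]


# Example: reduce a 15-frame clip to 10 and to 5 frames.
print(evenly_spaced_indices(15, 10))  # [0, 1, 3, 4, 6, 7, 9, 10, 12, 14]
print(evenly_spaced_indices(15, 5))   # [0, 3, 7, 10, 14]
```

Keeping the first and last frames preserves the start and end of the motion, which is what separates, for example, a Leftward gesture from a Rightward one with the same hand shape.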

Experimental Design, Materials and Methods
We used a FLIR Lepton thermal camera module (Lepton 2.5, 80 × 60, 50°, radiometric with shutter), shown in Fig. 3, to capture the hand gestures of each individual. The FLIR Lepton is a radiometric-capable LWIR OEM camera solution that is smaller than a dime, fits inside a smartphone, and costs one-tenth the price of traditional IR cameras. Lepton uses focal plane arrays of either 160 × 120 or 80 × 60 active pixels. The radiometric Lepton captures accurate, calibrated, noncontact temperature data for every pixel of each image [7]. Fig. 4a shows the setup for capturing the thermal images. Because the thermal camera has no USB port, it is mounted on a PureThermal 2 breakout board [8,9], whose micro-USB port connects to the computer. We 3D-printed a USB housing for the thermal camera and an adjustable camera stand to mount the camera on the tripod [10,11] and keep it stable. Fig. 4b also shows that we had to use electrical tape and strips to hold the 3D-printed parts, the tripod, and the camera together. To collect and save the data, the
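To get a feel for the spatial resolution at the distances used here (up to 3 m), a simple pinhole-camera estimate helps. The sketch assumes that the Lepton 2.5 has a 50° horizontal field of view, as its product designation indicates, and uses its 80-pixel horizontal resolution; lens distortion is ignored. `footprint` is an illustrative helper, not part of any camera SDK.

```python
from __future__ import annotations

import math

H_FOV_DEG = 50.0  # assumed horizontal field of view of the Lepton 2.5
H_PIXELS = 80     # horizontal resolution of the 80 x 60 array


def footprint(distance_m: float) -> tuple[float, float]:
    """Pinhole-model estimate at a given distance from the camera.

    Returns (scene width covered in metres, width covered by one pixel in cm).
    """
    width_m = 2.0 * distance_m * math.tan(math.radians(H_FOV_DEG / 2.0))
    return width_m, 100.0 * width_m / H_PIXELS


for d in (1.0, 2.0, 3.0):
    w, px = footprint(d)
    print(f"{d:.0f} m: {w:.2f} m wide, {px:.1f} cm per pixel")
```

Under these assumptions, at the 3 m maximum the frame spans about 2.8 m and each pixel covers roughly 3.5 cm, so a hand occupies only a handful of pixels across; gestures recorded closer to the camera are resolved in considerably more detail.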

Fig. 2 shows the frames for each hand gesture class:
• Fig. 2a to d: class 1, Flat/Leftward
• Fig. 2e to h: class 2, Flat/Rightward
• Fig. 2i to l: class 3, Flat/Contract
• Fig. 2m to p: class 4, Spread/Leftward
• Fig. 2q to t: class 5, Spread/Rightward
• Fig. 2u to x: class 6, Spread/Contract
• Fig. 2y to ab: class 7, V-shape/Leftward
• Fig. 2ac to af: class 8, V-shape/Rightward
• Fig. 2ag to aj: class 9, V-shape/Contract

Fig. 4. (a) Setup seen from the back, showing that the tripod is level; (b) final camera setup.